Spectral voice conversion based on unsupervised clustering of acoustic space
نویسنده
چکیده
Voice conversion systems aim at modifying a source speaker’s speech so that it is perceived as if a target speaker had spoken it. Applying voice conversion techniques to a concatenative text-to-speech synthesizer allows for the personification of such systems, so that additional voices from a single source-speaker database can be produced quickly and automatically. This paper presents a new algorithm in which an effective and simple solution to the problem of voice conversion is suggested with the goal of maintaining high speech quality. Here, spectral conversion is performed by locally linear transformations, where the minimum mean square estimation (MMSE) method is used to compute the transformations. The acoustic features included in the conversion are vocal tract parameters, which are represented by log area ratio coefficients. Evaluation by listening tests shows that the proposed algorithm makes it possible to convert speaker individuality while maintaining high quality.
منابع مشابه
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...
متن کاملSpeaker adaptation of an acoustic-articulatory inversion model using cascaded Gaussian mixture regressions
The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation trai...
متن کاملSpeaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions
The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation trai...
متن کاملHigh-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
متن کاملProsodic and Spectral iVectors for Expressive Speech Synthesis
This work presents a study on the suitability of prosodic and acoustic features, with a special focus on i-vectors, in expressive speech analysis and synthesis. For each utterance of two different databases, a laboratory recorded emotional acted speech, and an audiobook, several prosodic and acoustic features are extracted. Among them, i-vectors are built not only on the MFCC base, but also on ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000